Tomographic reconstruction system

ABSTRACT

A tomography system having a central processing unit, a system memory communicatively connected to the central processing unit, and a hardware acceleration unit communicatively connected to the central processing unit and the system memory, the hardware accelerator configured to perform at least a portion of an MBIR process on computer tomography data. The hardware accelerator unit may include one or more voxel evaluation modules which evaluate an updated value of a voxel given a voxel location in a reconstructed volume. By processing voxel data for voxels in a voxel neighborhood, processing time is reduced.

RELATED APPLICATIONS

The present patent application is a continuation of U.S. patent application Ser. No. 15/063,054, filed Mar. 7, 2016, which claims the priority benefit of U.S. Provisional Patent Application Ser. No. 62/129,018, filed Mar. 5, 2015. The contents of both of these applications are hereby incorporated by reference in their entirety into the present disclosure.

TECHNICAL FIELD

The present application relates to tomography imaging devices, e.g., computed tomography devices, and to tomography control systems.

BACKGROUND

Tomographic reconstruction is an important inverse problem in a wide range of imaging systems, including medical scanners, explosive detection systems and electron and X-ray microscopy for scientific and materials imaging. The objective of tomographic reconstruction is to compute a three-dimensional volume (a physical object or a scene) from two-dimensional observations that are acquired using an imaging system. An example of tomographic reconstruction is found in computed tomography (CT) scans, in which X-ray radiation is passed from several angles to record 2D radiographic images of specific parts of the scanned patient. These radiographic images are then processed using a reconstruction algorithm to form a 3D volumetric view of the scanned region, which is subsequently used for medical diagnosis.

Model Based Iterative Reconstruction (MBIR) is a promising approach to realize tomographic reconstruction. The MBIR framework formulates the problem of reconstruction as minimization of a high-dimensional cost function, in which each voxel in the 3D volume is a variable. An iterative algorithm is employed to optimize the cost function such that a pre-specified error threshold is met.

MBIR has demonstrated state-of-the-art reconstruction quality on various applications and has been utilized commercially in GE's healthcare systems. In addition to improved image quality, MBIR has enabled significant reduction in X-ray dosage in the context of lung cancer screening (~80% reduction) and pediatric imaging (30-50% reduction). In other application domains, MBIR offers additional advantages such as improved output resolution, precise definition with reduced impact of undesired artifacts in images, and the ability to reconstruct even with sparse view angles. These capabilities are critical in applications such as explosive detection systems (e.g., baggage and cargo scanners), where there is a need to reduce cost due to false alarm rates, operate under non-ideal view angles, and extend deployed systems to cover new threat scenarios.

While MBIR shows great potential, its high compute and data requirements are key bottlenecks to its widespread commercial adoption. For instance, reconstructing a 512×512×256 volume of nanoparticles viewed from different angles through an electron microscope requires 50.33 GOPs (giga operations) and 15 G memory accesses per iteration of MBIR. Further, the algorithm may take tens of iterations to converge, depending on the threshold. This places a significant demand on compute resources. One tested software implementation required ~1700 seconds per iteration on a 2.3 GHz AMD Opteron server with 196 GB memory, which is unacceptable for many practical applications. Thus, technologies that enable orders-of-magnitude improvement in MBIR's implementation efficiency are needed.
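The per-voxel cost implied by these figures can be estimated with simple arithmetic. The following Python sketch assumes that each voxel is updated exactly once per iteration; the variable names are illustrative only.

```python
# Back-of-the-envelope figures derived from the numbers quoted above.
# Assumption: every voxel is updated exactly once per MBIR iteration.
NX, NY, NZ = 512, 512, 256        # reconstructed volume dimensions
OPS_PER_ITER = 50.33e9            # operations per iteration
MEM_PER_ITER = 15e9               # memory accesses per iteration
SECONDS_PER_ITER = 1700.0         # software runtime per iteration

num_voxels = NX * NY * NZ
ops_per_voxel = OPS_PER_ITER / num_voxels
mem_per_voxel = MEM_PER_ITER / num_voxels
seconds_per_voxel = SECONDS_PER_ITER / num_voxels

print(f"voxels per iteration:     {num_voxels}")
print(f"operations per voxel:     {ops_per_voxel:.1f}")
print(f"memory accesses per voxel: {mem_per_voxel:.1f}")
print(f"time per voxel update:    {seconds_per_voxel * 1e6:.1f} microseconds")
```

Under this assumption each voxel update involves on the order of several hundred operations and a few tens of microseconds of software time, which indicates how little work is available to parallelize within a single update.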

SUMMARY

According to various aspects, a tomography system is provided, comprising a central processing unit, a system memory communicatively connected to the central processing unit and a hardware acceleration unit communicatively connected to the central processing unit and the system memory, the hardware accelerator configured to perform at least a portion of an MBIR process on computer tomography data. The system may comprise one or more voxel evaluation modules that evaluate an updated value of a voxel given a voxel location in a reconstructed volume. The operations may further include determining a reconstructed image for the selected voxel using the updated value of the voxel. The system may also include a computer tomography scanner, wherein the computer tomography scanner is configured to irradiate a test object, measure resulting radiation, and provide measured data corresponding to the resulting radiation.

BRIEF DESCRIPTION OF THE DRAWINGS

In the following description and drawings, identical reference numerals have been used, where possible, to designate identical features that are common to the drawings.

FIG. 1 is a diagram showing the components of an example tomography system according to one embodiment.

FIG. 2 is a diagram showing an access pattern for voxel update in an MBIR algorithm useful with various aspects.

FIG. 3 is a block diagram of an example implementation of hardware specialized to execute the MBIR algorithm useful with various aspects.

FIG. 4 is a block diagram of a computation engine/Voxel Evaluation Module of FIG. 3 used for voxel update.

FIG. 5 shows a scheme where constraining voxels on an x-z line enables sharing of an A matrix column.

FIG. 6 shows a scheme where a neighborhood is reused if each computation engine is designed to update a volume around the selected voxel.

The attached drawings are for purposes of illustration and are not necessarily to scale.

DETAILED DESCRIPTION

X-ray computed tomography, positron-emission tomography, and other tomography imaging systems are referred to herein generically as “CT” systems.

Throughout this description, some aspects are described in terms that would ordinarily be implemented as software programs. Those skilled in the art will readily recognize that the equivalent of such software can also be constructed in hardware, firmware, or micro-code. Because data-manipulation algorithms and systems are well known, the present description is directed in particular to algorithms and systems forming part of, or cooperating more directly with, systems and methods described herein. Other aspects of such algorithms and systems, and hardware or software for producing and otherwise processing signals or data involved therewith, not specifically shown or described herein, are selected from such systems, algorithms, components, and elements known in the art. Given the systems and methods as described herein, software not specifically shown, suggested, or described herein that is useful for implementation of any aspect is conventional and within the ordinary skill in such arts.

FIG. 1 shows a tomography system 100 according to one embodiment. As shown, the system 100 includes a CT scanner 102, a central processing unit 104, a system memory 106, a tomography hardware accelerator unit 302, and a user interface 108. The CT scanner 102 may include a rotating gantry having an x-ray radiation source and sensors which direct radiation to an object being scanned at various angles to record 2D radiographic images. The various components shown in FIG. 1 may be communicatively connected by an electronic network.

Central processing unit 104, hardware accelerator unit 302, and other processors described herein, can each include one or more microprocessors, microcontrollers, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), programmable logic devices (PLDs), programmable logic arrays (PLAs), programmable array logic devices (PALs), or digital signal processors (DSPs).

System memory 106 can be a tangible non-transitory computer-readable storage medium, i.e., a non-transitory device or article of manufacture that participates in storing instructions that can be provided to central processing unit 104 or hardware accelerator unit 302 for execution. In one example, the system memory 106 comprises random access memory (RAM). In other examples, the system memory 106 may comprise a hard disk drive.

The phrase “communicatively connected” includes any type of connection, wired or wireless, for communicating data between devices or processors. These devices or processors can be located in physical proximity or not.

To better illustrate the technical challenges involved in the implementation of MBIR, an explanation of the mathematical concepts behind the algorithm will be provided. In order to describe the MBIR approach, it is useful to think of all the 2D images (measurements) as well as the unknown 3D volume of voxels as one-dimensional vectors. If y is an M×1 vector containing all the measurements, x is an N×1 vector containing all the voxels in the 3D volume, and A is a sparse M×N matrix implementing the line integral through the 3D volume, then the MBIR reconstruction is obtained by minimizing the following function,

$\begin{matrix}{{c(x)} = {{\frac{1}{2}{{y - {Ax}}}_{\Lambda}^{2}} + {\beta {\sum\limits_{r,{s\epsilon N}_{s}}{w_{rs}{\rho \left( {x_{s} - x_{r}} \right)}}}}}} & (1)\end{matrix}$

where N represents the set of all pairs of neighboring voxels in 3D (using, say, a 26-point neighborhood system), ρ(·) is a potential function that incorporates a model for the underlying image, Λ is a diagonal matrix whose entries weight each term by a factor inversely proportional to the noise in the measurement, and w_(rs) is a set of normalized weights depending on the physical distance between neighboring voxels. The first term in equation (1) has the interpretation of enforcing consistency of the desired reconstruction with the measurements, while the second term enforces certain desirable characteristics in the reconstruction (sharp edges, low noise, etc.). The term y−Ax, which represents the difference between the original 2D measurements and the 2D projections obtained from the 3D volume, is called the error sinogram (e).

While several variants of the MBIR algorithm exist based on how the cost function is minimized, a popular variant called the Iterative Coordinate Descent MBIR (ICD-MBIR) is considered here. The basic idea in ICD is to update the voxels one at a time so as to monotonically decrease the value of the original function (equation (1)) with each update. Since the cost function is convex and is bounded from below, this method converges to the global minimum.

The cost function in equation (1) with respect to a single voxel (ignoring constants) indexed by s is given by

$\begin{matrix}{{c_{s}(z)} = {{\theta_{1}z} + {\frac{\theta_{2}}{2}\left( {z - x_{s}} \right)^{2}} + {\sum\limits_{r \in N_{a}}{w_{rs}{\rho \left( {z - x_{r}} \right)}}}}} & (2) \\{\theta_{1} = {{{- e^{t}}\Lambda \; A_{*{,s,}}\theta_{2}} = {A_{*{,s}}^{t}\Lambda \; A_{s,*}}}} & (3)\end{matrix}$

where A_(*,s) is the s^(th) column of A, e=y−Ax, N_(s) is the set of neighbors of voxel s, and x_(s) is the current value of the voxel s.
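As a minimal sketch of equation (3), the following Python function accumulates θ₁ and θ₂ over the non-zero entries of one column of A. The (row index, value) list representation, the function name theta_terms, and the numeric values are assumptions made for illustration.

```python
def theta_terms(column, lam, e):
    """Evaluate theta1 and theta2 of equation (3) for one voxel.

    column : list of (row_index, value) pairs, the non-zeros of A[:, s]
             (illustrative sparse representation)
    lam    : diagonal entries of Lambda, one per measurement
    e      : current error sinogram, e = y - A x
    """
    theta1 = 0.0
    theta2 = 0.0
    for i, a in column:
        theta1 -= e[i] * lam[i] * a   # -e^t Lambda A[:, s]
        theta2 += a * lam[i] * a      # A[:, s]^t Lambda A[:, s]
    return theta1, theta2


# Hypothetical three-measurement example; row 1 is not touched by the column.
column = [(0, 1.0), (2, 0.5)]
lam = [1.0, 1.0, 2.0]
e = [0.2, 9.9, -0.4]
theta1, theta2 = theta_terms(column, lam, e)
print(theta1, theta2)
```

Only the rows in which the column is non-zero are read from the error sinogram, so the cost of the reduction scales with the number of non-zeros in the column rather than with the number of measurements.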

Due to the complicated nature of the function ρ(·), it is typically not possible to find a simple closed-form expression for the minimum of (2). Hence, ρ(·) is often replaced by a quadratic surrogate function, which makes (2) simpler to minimize. In particular, if

$\begin{matrix}{a_{rs} = \left\{ \begin{matrix}\frac{\rho^{\prime}\left( x_{r} - x_{s} \right)}{\left( x_{r} - x_{s} \right)} & {x_{s} \neq x_{r}} \\ {\rho^{\prime\prime}(0)} & {x_{s} = x_{r}}\end{matrix} \right.} & (4)\end{matrix}$

then an overall surrogate function to (2) is given by

$\begin{matrix}{{c_{s}(z)} = {{\theta_{1}\left( {z - x_{s}} \right)} + {\theta_{2}\left( {z - x_{s}} \right)}^{2} + {\sum\limits_{r \in N_{s}}{w_{rs}{a_{rs}\left( {z - x_{r}} \right)}^{2}}}}} & (5)\end{matrix}$

Taking the derivative of this surrogate function and setting it to zero, it can be verified that the minimum of the function is

$\begin{matrix}{z^{*} \leftarrow \frac{\theta_{2}x_{s} - \theta_{1} + \sum\limits_{r \in N_{s}} w_{rs}a_{rs}x_{r}}{\theta_{2} + \sum\limits_{r \in N_{s}} w_{rs}a_{rs}}.} & (6)\end{matrix}$

Note that minimizing (5) ensures a decrease of (2) and hence that of the original function (1). The algorithm can be efficiently implemented by keeping track of the error sinogram e along with each update.
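The single-voxel update of equations (4) and (6) may be sketched in Python as follows. The potential function ρ(Δ) = sqrt(1 + Δ²) − 1 is an assumption chosen only because its derivative is simple; the disclosure does not prescribe a particular ρ, and the function names are illustrative.

```python
import math


def rho_prime(d):
    # Derivative of the assumed potential rho(d) = sqrt(1 + d*d) - 1.
    return d / math.sqrt(1.0 + d * d)


RHO_SECOND_AT_ZERO = 1.0  # rho''(0) for the assumed potential


def surrogate_coeff(x_r, x_s):
    """Surrogate coefficient a_rs of equation (4)."""
    d = x_r - x_s
    if d == 0.0:
        return RHO_SECOND_AT_ZERO
    return rho_prime(d) / d


def update_voxel(x_s, theta1, theta2, neighbors, weights):
    """Return z* of equation (6) for one voxel.

    neighbors : current values x_r of the neighboring voxels
    weights   : corresponding weights w_rs
    """
    num = theta2 * x_s - theta1
    den = theta2
    for x_r, w in zip(neighbors, weights):
        a = surrogate_coeff(x_r, x_s)
        num += w * a * x_r
        den += w * a
    return num / den


# With theta1 = 0 and all neighbors equal to x_s, the voxel does not move.
unchanged = update_voxel(2.0, 0.0, 1.5, [2.0, 2.0], [0.5, 0.5])
# A negative theta1 (positive correlation with the error) raises the voxel.
raised = update_voxel(2.0, -1.5, 1.5, [2.0, 2.0], [0.5, 0.5])
print(unchanged, raised)
```

The update is a weighted average of a data-driven target and the neighbor values, so a voxel that already agrees with both its measurements and its neighbors is left unchanged.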

Steps of an MBIR process according to one embodiment are summarized in Table 1 below.

TABLE 1
Input: 2D measurements: y
Output: Reconstructed 3D volume: x
 1: Initialize x at random
 2: Error sinogram: e = y − Ax
 3: while convergence criteria not met do
 4:   for each voxel v in random order do
 5:     θ₁ and θ₂ = f(e, A_(*,v))
 6:     for voxels u ∈ neighborhood N_(v) of v do
 7:       Compute surrogate fn. coefficient a_(uv) for u (Eqn. 4)
 8:     end for
 9:     Compute z* = g(θ₁, θ₂, x_(v), a_(*v)) (Eqn. 6)
10:     Update error sinogram: e ← e − (z* − x_(v))A_(*,v)
11:     Update voxel: x_(v) ← z*
12:   end for
13: end while

Given a set of 2D measurements (y) as inputs, the process produces the reconstructed 3D volume (x) at the output. First, the voxels in x are initialized at random (line 1). Next, the error sinogram (e) is computed as the difference between the 2D measurements and the 2D views obtained by projecting the current 3D volume (line 2). The process iteratively updates the voxels (lines 3-13) until the convergence criterion is met. In each iteration, every voxel in the volume is updated once in random order.

Lines 5-11 of the process in Table 1 describe the steps involved in updating a voxel. First, the parameters θ₁ and θ₂ are computed using the A matrix and the error sinogram e (line 5). Next, the quadratic surrogate function is evaluated for each of the voxel neighbors (lines 6-8). These are utilized to compute the new value of the voxel z* (line 9). The error sinogram (e) and the 3D volume (x) are then updated with the new voxel value (lines 10-11).
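The process of Table 1 can be exercised end to end on a toy problem. The Python sketch below makes several simplifying assumptions that are not part of the disclosure: the volume is a one-dimensional chain of voxels whose neighbors are the adjacent entries, A is a small dense matrix, the potential is ρ(Δ) = sqrt(1 + Δ²) − 1 with a single weight w, and a fixed number of sweeps stands in for the convergence test.

```python
import math
import random


def rho(d):
    # Assumed potential function (not specified by the disclosure).
    return math.sqrt(1.0 + d * d) - 1.0


def surrogate_coeff(d):
    # Equation (4) for the assumed potential; rho''(0) = 1.
    return 1.0 if d == 0.0 else (d / math.sqrt(1.0 + d * d)) / d


def cost(A, lam, y, x, w):
    # Equation (1) on a 1D chain, with beta folded into the weight w.
    e = [y[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(y))]
    data = 0.5 * sum(lam[i] * e[i] ** 2 for i in range(len(y)))
    prior = sum(w * rho(x[j + 1] - x[j]) for j in range(len(x) - 1))
    return data + prior


def icd(A, lam, y, w=0.1, sweeps=20, seed=0):
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [rng.random() for _ in range(n)]                       # line 1
    e = [y[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(m)]  # line 2
    history = [cost(A, lam, y, x, w)]
    for _ in range(sweeps):                                    # line 3 (fixed sweeps)
        order = list(range(n))
        rng.shuffle(order)                                     # line 4
        for v in order:
            col = [(i, A[i][v]) for i in range(m) if A[i][v] != 0.0]
            theta1 = -sum(e[i] * lam[i] * a for i, a in col)   # line 5
            theta2 = sum(a * lam[i] * a for i, a in col)
            num, den = theta2 * x[v] - theta1, theta2
            for u in (v - 1, v + 1):                           # lines 6-8
                if 0 <= u < n:
                    a_uv = surrogate_coeff(x[u] - x[v])
                    num += w * a_uv * x[u]
                    den += w * a_uv
            z = num / den                                      # line 9
            for i, a in col:                                   # line 10
                e[i] -= (z - x[v]) * a
            x[v] = z                                           # line 11
        history.append(cost(A, lam, y, x, w))
    return x, history


# Toy system: three voxels, four measurements of hypothetical values.
A = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
y = [1.0, 3.0, 5.0, 3.0]
lam = [1.0, 1.0, 1.0, 1.0]
x_hat, history = icd(A, lam, y)
print(x_hat)
print(history[0], history[-1])
```

The recorded cost history is useful as a sanity check: because each update minimizes a surrogate of the single-voxel cost, the cost should not increase from one sweep to the next.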

ICD-MBIR offers several advantages over other MBIR variants: (i) it takes fewer iterations to converge, thereby enabling faster runtimes, and (ii) it is general and can be easily adapted for a variety of applications with different geometries, noise statistics and image models, without the need for custom algorithmic tuning for each application.

However, a key challenge with ICD-MBIR is that it is not easily amenable to efficient parallel execution on modern multi-cores and many-core accelerators such as general purpose graphics processing units (GPGPUs), for the following reasons. First, there is limited data parallelism within the core computations that evaluate the updated value of the voxel. From a computational standpoint, each voxel update to create a 3D image 208 involves accessing 3 key data structures, as illustrated in FIG. 2: (i) a column 212 of the A matrix 210, wherein the column 212 is indexed by the x and z co-ordinates of a voxel 214, (ii) voxel neighborhood 216, which refers to the voxels adjacent to the current voxel 214 along all directions, and (iii) portions of the error sinogram, which are determined by the slice ID or y co-ordinate of the voxel 214. Since the A matrix column 212 is typically sparse (sparsity ratio of 1000:1), the per-voxel update computations are relatively small (Time/Voxel-update: ~26 μs), and the overheads of parallelization such as task startup time, synchronization between threads, and off-chip memory bandwidth significantly limit performance. In summary, parallelizing computations within each voxel update yields very little performance improvement.

The present disclosure provides a specialized hardware architecture and associated control system to simultaneously improve both the runtime and energy consumption of the MBIR algorithm by exploiting its computational characteristics.

FIG. 3 shows a block diagram of the tomography hardware accelerator unit 302 according to one embodiment. The tomography hardware accelerator unit 302 receives as input: 2D measurement data 304, reconstructed 3D volume data 306, A matrix data 308, and error sinogram data 310. In certain embodiments, the 2D measurement data 304, reconstructed 3D volume data 306, A matrix data 308, and error sinogram 310 may be stored in memory blocks external to and operatively connected to the tomography hardware accelerator unit 302. The tomography hardware accelerator unit 302 may also include a global control unit 312 containing appropriate control registers and logic to initialize the location of the external memory blocks and generate interface signals 314 for sending/receiving inputs/outputs to and from the tomography hardware accelerator unit 302.

At a high level, operation of the tomography hardware accelerator unit 302 can be summarized as follows. First, the global control unit 312 generates a random voxel ID (x,y,z co-ordinates). Based on the co-ordinates, the tomography hardware accelerator unit 302 retrieves the following data from the system memory 106, which is required to update the voxel 214: a column 212 of the A matrix 210, a portion of the error sinogram 310, and the voxel neighbor data 216 (which is a portion of the volume data 306). The tomography hardware accelerator unit 302 may include internal memory blocks to store these data structures. The updated value of the voxel 214 is then computed by the tomography hardware accelerator unit 302 and stored back to the external system memory 106. This process is repeated until the convergence criterion is met.

The tomography hardware accelerator unit 302 may also comprise one or more voxel evaluation modules (VEMs) 316. Each VEM 316 may comprise a theta evaluation module (TEM) 318, a neighborhood processing element (NPE) 320, and a voxel update element (VUE) 322. Each of the TEM 318, NPE 320, and VUE 322 may comprise one or more computer processors and associated memory for performing calculations on the received data.

The TEM 318 evaluates the variables θ₁ and θ₂ of the MBIR algorithm. The NPE 320 applies a complex one-to-one function on each of the neighbor voxels. The VUE 322 uses the outputs of the TEM 318 and NPE 320 to compute the updated value of the voxel 214 and the error sinogram 310. The processing elements 318, 320 and 322 may comprise hardware functional blocks such as adders, multipliers, registers, etc. that are interconnected to achieve the desired functionality, in some cases over multiple cycles of operation. The VEM 316 may also contain memory blocks that store the column 212 of A matrix 210, portions of the error sinogram 310 and the voxel neighbor data, all of which may be accessed by the TEM 318, NPE 320 and VUE 322. Since the A matrix is sparse, it may be stored as an adjacency list using First-In-First-Out (FIFO) buffers in certain embodiments. A controller present within the VEM 316 is designed to fetch the necessary data if it is not already available in the internal memory blocks of the VEM 316.

The VEM 316 operates as follows. First, the elements of the A matrix column 212 are transferred into the TEM 318. The TEM 318 utilizes the index of the A matrix elements to address the error sinogram 310 memory to obtain the corresponding error sinogram 310 value. The TEM 318 performs a vector reduction operation on the A matrix column 212 and error sinogram 310 values to obtain θ₁ and θ₂. In parallel to the TEM 318, the NPE 320 operates on the data of each of the voxel's neighbors and stores the processed neighbor values in a FIFO memory located in the VEM 316. Since the TEM 318 and NPE 320 operate in parallel, the performance of the VEM 316 is maximized when their latencies are equal. This is achieved by proportionately allocating hardware resources in their implementation. The output of the TEM 318 and NPE 320 is directed to the VUE 322, which computes the updated value of the voxel 214. This involves performing a vector reduction operation on the voxel neighborhood 216, followed by multiple scalar operations. The entries in the error sinogram 310 memory are also updated based on the updated value of the voxel 214. Finally, the voxel 214 data is written back to the system memory 106. Thus, the VEM 316 efficiently computes the updated value of a voxel.

In certain embodiments, the performance of the computation engine can be further improved by operating it as a two-level nested pipeline. The first-level pipeline is within the TEM 318. In this case, the VEM 316 leverages the pipeline parallelism across the different elements of the vector reduction. When the TEM 318 computes on a given A matrix element, the error sinogram value for the successive element is fetched from the error sinogram memory in a pipelined manner. The second-level pipelining exploits the parallelism across successive voxels. In this case, the VEM 316 concurrently transfers data required by the subsequent voxel, even as the previous voxel is being processed by the VEM 316. Thus, both pipeline levels improve performance by overlapping data communication with computation.
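The benefit of overlapping data transfer with computation across successive voxels can be illustrated with an idealized timing model. The Python sketch below assumes constant per-voxel transfer and compute times and perfect double buffering; the function names and the time values are hypothetical and are not measurements of the described hardware.

```python
def serial_time(num_voxels, t_transfer, t_compute):
    """Total time when each voxel's data is fetched, then processed."""
    return num_voxels * (t_transfer + t_compute)


def pipelined_time(num_voxels, t_transfer, t_compute):
    """Total time when the fetch for voxel k+1 overlaps the compute of voxel k.

    Idealized model: the first transfer and the last compute cannot be
    hidden; every step in between costs the slower of the two stages.
    """
    if num_voxels == 0:
        return 0
    return t_transfer + (num_voxels - 1) * max(t_transfer, t_compute) + t_compute


# Hypothetical time units per voxel.
n, t_x, t_c = 4, 3, 5
print(serial_time(n, t_x, t_c), pipelined_time(n, t_x, t_c))
```

In this model the steady-state cost per voxel is set by the slower stage, which is the same reason balanced latencies are desirable between units that operate in parallel.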

Each execution of the VEM 316 requires the A matrix column 212, the error sinogram 310 and the voxel neighbor data 216 to be transferred from the system memory 106 to the VEM 316. To minimize data transfer overhead, in certain embodiments, the VEM 316 reuses the data stored in the internal memory blocks of the VEM 316 across multiple voxels. Since voxels in a slice 218 share the same portion of the error sinogram 310 (FIG. 2), the VEM 316 constrains the sequence in which the voxels 214 are updated in the VEM as follows: First, a slice 218 is selected from the volume 208 at random. Then, all voxels 214 in the slice 218 are updated in a random sequence before the next slice is chosen. In this case, the error sinogram 310 needs to be fetched only once per slice, and all voxels within the slice can re-use the data. Thus, the data transfer cost for the error sinogram 310 is amortized across all voxels 214 in the slice 218. This optimization can be simply realized by modifying the global control unit 312 that generates the voxel ID. In other words, the global control unit 312 may be programmed to select voxels for updating in an order that processes a majority of voxels from the same slice together.
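A slice-constrained update order of this kind can be sketched as a Python generator. The sketch assumes that a slice is identified by the y co-ordinate of its voxels; the function names and the small volume dimensions are illustrative.

```python
import random


def slice_ordered_voxel_ids(nx, ny, nz, seed=0):
    """Yield (x, y, z) voxel IDs: slices in random order, and voxels in a
    random order within each slice. A slice is indexed by y (assumption)."""
    rng = random.Random(seed)
    slices = list(range(ny))
    rng.shuffle(slices)
    for y in slices:
        in_slice = [(x, z) for x in range(nx) for z in range(nz)]
        rng.shuffle(in_slice)
        for x, z in in_slice:
            yield (x, y, z)


def sinogram_fetches(voxel_ids):
    """Count how often the slice changes, i.e. how often a new portion of
    the error sinogram would have to be fetched."""
    fetches, current = 0, None
    for _, y, _ in voxel_ids:
        if y != current:
            fetches += 1
            current = y
    return fetches


ids = list(slice_ordered_voxel_ids(4, 3, 2))
print(len(ids), sinogram_fetches(ids))
```

With this ordering the number of error sinogram fetches equals the number of slices, whereas a fully random order over the volume would change slices on most consecutive updates.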

In certain embodiments, the VEMs 316 are arranged as an array of L lanes, each containing a dedicated TEM 318, NPE 320, and VUE 322. To ensure convergence of the MBIR algorithm, the voxels that are to be updated in parallel are chosen to be located far apart in the 3D volume. To maximize the distance of separation, the voxels may be selected from different slices 218 that are equally spread out across the entire y dimension (FIG. 2) of the 3D volume 208.
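One possible way to pick equally spaced slices for the L lanes is sketched below in Python. The stride of ny // L, the wrap-around with a starting offset, and the function name are assumptions for illustration; the disclosure states only that the slices are spread out across the y dimension.

```python
def parallel_slices(ny, lanes, offset=0):
    """Return one slice index (y co-ordinate) per lane, equally spaced
    across the y dimension and wrapped around at the volume boundary."""
    if lanes < 1 or lanes > ny:
        raise ValueError("need 1 <= lanes <= ny")
    stride = ny // lanes
    return [(offset + k * stride) % ny for k in range(lanes)]


# Hypothetical configuration: 256 slices, 4 lanes, starting at slice 10.
slices = parallel_slices(256, 4, offset=10)
print(slices)
```

Incrementing the offset between executions would let the lanes sweep the whole volume while keeping the same separation between concurrently updated voxels.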

In certain embodiments, as shown in FIG. 5, the tomography hardware accelerator unit 302 restricts concurrently updated voxels to lie on a straight line in the volume. In the embodiment illustrated in FIG. 5, the line 502 is parallel to the y axis of the volume 208 (i.e., the voxels on the line have the same x,z coordinates), although any straight line in the volume may be used. Since A matrix columns are indexed only using the x and z co-ordinates (FIG. 2), concurrently updated voxels 214 share the same A matrix column 212, thereby linearly reducing the A matrix data transfer time. This optimization does not impact convergence, as the slices from which the voxels 214 are picked lie sufficiently far apart (the y dimension of the volume is much larger than the number of parallel voxel updates). As shown in FIG. 3, an A matrix memory 324 may be placed outside the VEMs 316 and shared by all VEMs 316 during operation. Before each execution of the VEM array, the neighborhoods for all voxels 214 and one A matrix column 212 are transferred from the system memory 106 to the memory 324. This results in a net reduction of L−1 A matrix column transfers per execution, where L is the number of VEMs 316 in the tomography hardware accelerator unit 302.
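The saving of L−1 column transfers can be checked with a short count. In the Python sketch below, the function name and the voxel co-ordinates are hypothetical; the only property relied on is that an A matrix column is identified by the (x, z) pair of a voxel.

```python
def column_transfers(voxels, shared):
    """Number of A matrix column transfers for one execution of the array.

    voxels : list of (x, y, z) voxel IDs updated concurrently
    shared : True if the lanes share one A matrix memory, so that each
             distinct (x, z) column is transferred only once
    """
    columns = [(x, z) for x, _, z in voxels]
    return len(set(columns)) if shared else len(columns)


# Four lanes updating voxels on a line parallel to the y axis.
lane_voxels = [(5, y, 7) for y in (10, 74, 138, 202)]
unshared = column_transfers(lane_voxels, shared=False)
shared = column_transfers(lane_voxels, shared=True)
print(unshared, shared, unshared - shared)
```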

In certain embodiments, the neighborhood voxel data transfer time may be reduced by concurrently updating the neighborhood volume 216 of size a×b×c (x,y,z directions) around the voxel 214. Each VEM 316 is then used to update one of the voxels 214 in the volume 216 in a parallel fashion. To facilitate the neighbor voxel reuse, in certain embodiments, the capacity of the neighborhood memory is sized to hold all of the data for voxels in the neighborhood volume 216. Also, once the updated value of a voxel is computed, it needs to be written to the neighborhood memory 216 within the VEM 316 (in addition to the system memory 106), as subsequent voxels in the volume use the updated value. Also, since the adjacent voxels along the y direction belong to different slices, the error sinogram memory in the VEM 316 may be replicated to hold data corresponding to each slice 218 in the volume 216. Along similar lines, voxels within the slice 218 use different A matrix columns, and correspondingly the A matrix memory is also replicated. The TEM 318, NPE 320 and VUE 322 are not replicated in the VEM 316 in such embodiments, as voxels in the volume 216 are evaluated sequentially. Finally, the global control unit 312 and VEMs 316 are modified to appropriately index these memories and evaluate all voxels within the neighborhood volume 216.
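The potential reduction in neighbor data traffic can be estimated by counting voxel values. The Python sketch below assumes a 26-point neighborhood, ignores the boundaries of the volume, and assumes that updating an a×b×c block requires the block plus a one-voxel halo on every side; these assumptions and the function names are illustrative only.

```python
def values_fetched_per_voxel(a, b, c):
    """Voxel values fetched if every voxel in an a x b x c block fetches its
    own 3 x 3 x 3 window (itself plus 26 neighbors) independently."""
    return a * b * c * 27


def values_fetched_per_block(a, b, c):
    """Voxel values fetched if the whole block and a one-voxel halo are
    loaded once and reused for all voxels in the block."""
    return (a + 2) * (b + 2) * (c + 2)


# Hypothetical 2 x 2 x 2 block.
separate = values_fetched_per_voxel(2, 2, 2)
block = values_fetched_per_block(2, 2, 2)
print(separate, block)
```

Under these assumptions the saving grows with the block size, at the cost of the larger neighborhood memory and the replicated error sinogram and A matrix memories.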

Steps of various methods described herein can be performed in any order except when otherwise specified, or when data from an earlier step is used in a later step. Exemplary method(s) described herein are not limited to being carried out by components particularly identified in discussions of those methods.

Various aspects provide more effective processing of CT data. A technical effect is to improve the functioning of a CT scanner by substantially reducing the time required to process CT data, e.g., by performing MBIR or other processes to determine voxel values. A further technical effect is to transform measured data from a CT scanner into voxel data corresponding to the scanned object.

Various aspects described herein may be embodied as systems or methods. Accordingly, various aspects herein may take the form of an entirely hardware aspect, an entirely software aspect (including firmware, resident software, micro-code, etc.), or an aspect combining software and hardware aspects. These aspects can all generally be referred to herein as a “service,” “circuit,” “circuitry,” “module,” or “system.”

Furthermore, various aspects herein may be embodied as computer program products including computer readable program code (“program code”) stored on a computer readable medium, e.g., a tangible non-transitory computer storage medium or a communication medium. A computer storage medium can include tangible storage units such as volatile memory, nonvolatile memory, or other persistent or auxiliary computer storage media, removable and non-removable computer storage media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. A computer storage medium can be manufactured as is conventional for such articles, e.g., by pressing a CD-ROM or electronically writing data into a Flash memory. In contrast to computer storage media, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transmission mechanism. As defined herein, “computer storage media” do not include communication media. That is, computer storage media do not include communications media consisting solely of a modulated data signal, a carrier wave, or a propagated signal, per se.

The program code can include computer program instructions that can be loaded into processor 186 (and possibly also other processors), and that, when loaded into processor 186, cause functions, acts, or operational steps of various aspects herein to be performed by processor 186 (or other processor). The program code for carrying out operations for various aspects described herein may be written in any combination of one or more programming language(s), and can be loaded from disk 143 into code memory 141 for execution. The program code may execute, e.g., entirely on processor 186, partly on processor 186 and partly on a remote computer connected to network 150, or entirely on the remote computer.

The invention is inclusive of combinations of the aspects described herein. References to “a particular aspect” (or “embodiment” or “version”) and the like refer to features that are present in at least one aspect of the invention. Separate references to “an aspect” (or “embodiment”) or “particular aspects” or the like do not necessarily refer to the same aspect or aspects; however, such aspects are not mutually exclusive, unless otherwise explicitly noted. The use of singular or plural in referring to “method” or “methods” and the like is not limiting. The word “or” is used in this disclosure in a non-exclusive sense, unless otherwise explicitly noted.

The invention has been described in detail with particular reference to certain preferred aspects thereof, but it will be understood that variations, combinations, and modifications can be effected within the spirit and scope of the invention.

1. A tomography system, comprising: a central processing unit; a system memory communicatively connected to the central processing unit; and a hardware acceleration unit communicatively connected to the central processing unit and the system memory, the hardware accelerator configured to perform at least a portion of an MBIR process on computer tomography data.
2. The system according to claim 1, further comprising one or more voxel evaluation modules that evaluate an updated value of a voxel given a voxel location in a reconstructed volume.
3. The system according to claim 1, further comprising an electronic display, the display operatively connected to the central processing unit, the electronic display configured to display a reconstructed image based on the computer tomography data.
4. The system according to claim 1, further comprising a computer tomography scanner, wherein the computer tomography scanner is configured to irradiate a test object, measure resulting radiation, and provide measured data corresponding to the resulting radiation.
5. The system according to claim 1, wherein the hardware acceleration unit comprises a control unit which generates a pseudo-random sequence of voxel locations.
6. The system of claim 1, wherein the hardware acceleration unit is configured to identify voxel data required to update a voxel and fetch the voxel data from the system memory.
7. The system of claim 2, wherein the VEM contains VEM memory blocks internal to the VEM, which store the data needed to compute the updated voxel value.
8. The system of claim 2, wherein the VEM is configured to assess the data stored in the VEM memory blocks, re-use said data stored in the VEM memory blocks across multiple voxel evaluations, and partially fetch unavailable data from the system memory.
9. The system of claim 2, wherein the VEM is configured to perform data transfer operations and data processing operations in parallel.
10. The system of claim 2, wherein the VEM is configured to perform data transfer operations and data processing operations in a pipelined manner.
11. The system of claim 2, wherein the hardware accelerator unit fetches data required for a voxel from the system memory, while computations corresponding to a different voxel are in progress.
12. The system of claim 2, wherein the hardware accelerator unit comprises a plurality of VEMs, the VEMs configured to update multiple voxels in parallel.
13. The system of claim 12, wherein the sequence of voxels updated on a VEM is constrained to enhance data reuse within the accelerator.
14. The system of claim 12 in which at least one next voxel processed on a given VEM is constrained to lie within a common slice as the previous voxel processed on the VEM, thereby enabling the error sinogram memory to be reused.
15. The system of claim 12 in which voxels updated concurrently on multiple VEMs are constrained such that they share at least one entry of an A matrix of the tomography data.
16. The system of claim 15 where the said shared entry of the A matrix is fetched only once from the system memory and used by multiple VEMs.
17. The system of claim 12 in which adjacent voxels are updated on the same VEM, enabling neighborhood voxel data to be shared between the voxels.
18. The system of claim 12 where each VEM is configured to update a voxel neighborhood around a given voxel, the neighborhood comprising voxels adjacent to the given voxel.